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Part  2:  STIFFNESS  MATRICES;  A  CASE  STUDY 
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Abstract 


^  The  stripe  structures  of  stiffness  matrices  resulting 
from  •'■Irtecjiula^  domains  covered  by  regular  grids  are 
analysed;,  It  is- jpiSrgyed  that  the  non-zero  elements  in  these 
mate  ices  may  be  ^i^ered  by  very  few  stripes,  and  that  these 
strips^  may  be  nori-^^er lapping,  if  the  nodes  of  the  grids 
are  numbered  appropr iatly.  The  exact  number  of  stripes, 
which  is  independent  of  the  size  of  the  problem,  is  derived 
for  different  types  of  grids,  and  different  numbering 
schemes.  The  stripe  structures  of  some  irregular  grids  are 


also  examined. 


•)  This  work  is,  in  part,  supported  under  ONR  contract 
N00014-85-K-0339. 

**)  On  leave  from  the  Dept,  of  Computer  Science,  Purdue 
Univ.,  West  lafayette,  IN  47907. 


1.  INTRODUCTION 


Many  techniques  have  been  suggested  for  the  efficient 
solution  of  sparse  linear  systems;  they  Involve  often  highly 
Irregular  storage  schemes  aund  manipulation  algorithms  for 
the  non-zero  elements  In  the  matrix  [4].  Although,  these 
techniques  lead  to  very  powerful  sequential  Implementations 
(see  e.g.  [3]),  they  are  not  at  all  suitable  for  parallel 
architectures.  In  fact,  parallel  processing  requires.  In 
general,  a  rather  regular  pattern  of  computation  In  order  to 
minimize  data  conflict  and  communication  delays. 

In  Part  1  of  this  presentation  (see  [6]),  a  method  Is 
Introduced  for  representing  all  non-zero  elements  of  a 
sparse  matrix  In  a  stripe  structure  that  provides.  In  some 
sense,  a  compromise  between  efficiency  and  regularity.  More 
specifically,  the  stripe  structure  Is  shown  to  possess 
enough  regularity  to  allow  for  the  design  of  some  efficient 
networks  for  the  parallel  manipulation  of  sparse  matrices. 
Two  networks,  namely  MAT/VEC  for  the  multiplication  of  a 
matrix  by  a  vector,  aund  TRIANG,  for  the  solution  of  trlauigu- 
lar  systems,  are  given  as  examples. 


Very  briefly,  a  stripe  of  an  nxn  matrix  A  is  a  set 
of  positions  that  contains  at  most  one  position,  (l,j),  of  A 
for  every  row  1;  that  is,  Sj^  »  ((i,Oj^(i))  :  ielcl^},  where  ^ 


I  ^ 


□ 

□ 


I^-{1, . . .  ,n} ,  and  Is  a  strictly  increasing  function.  Two 
stripes  Sj^  amd  are  ordered  by  if,  for  any  1  auid  j 
In  the  domains  of  o^  and  a  ,  respectively. 


fy  Codes 


vy  / 


j>.d/or 

peciai 


(1) 


i  *  j  implies 


A  stripe  structure  of  the  matrix  A  is  then  defined  as  a 
disjoint  union  of  stripes  S^,  'k.’*!, . . .  ,n ,  which  satisfies 


contains  all  the  non  zero  elements  of  A. 


More 


specifically,  if  a.  then,  there  should  exist  a  unique 
k,  such  that  (i,j)eS^.  Also,  the  stripes  are 
said  to  be  non-overlapping  if 


for  any  integers  k,  i  and  m  such  that  (i,o^(x))eS^  and 
(i-m,Oj^^jjj(i-m)  If  the  inequality  in  (2)  is  strict, 
then  the  stripes  are  called  strictly  non-overlapping. 


The  linear  network  MAT/VEC  suggested  in  [6]  for  the 
multiplication  of  a  matrix  A  by  a  vector  x  consists  of  n 
cells,  where  n  is  the  number  of  stripes  in  the  representa¬ 
tion  of  A  (called  the  stripe  count).  Every  two  consecutive 
cells  k  and  k+1  in  MAT/VEC  are  connected  by  two  unidirec¬ 
tional  communication  links,  where  a  link  is  regarded  as  a 
queue  that  may  buffer  data  between  cells  k  autd  k-t-1.  One  of 
the  links  is  directed  from  k-M  to  k  and  transmits  the  ele¬ 
ments  of  the  input  vector  x,  and  the  other  is  directed  from 
k-t-1  to  k  and  transmits  the  elements  of  the  result  vector 
y->Ax.  The  network  is  data  driven  in  the  sence  that  the 
operation  of  each  cell  is  initiated  by  the  availability  of 
its  input . 


In  order  to  estimate  the  running  time  of  MAT/VEC,  it  is 
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assumed  that  the  execution  alternates  between  two  phases, 
namely  a  comnunlcatlon  phase,  amd  a  processing  phase.  In 
[6],  It  Is  shown  that  If  the  Input  matrix  to  MAT/VEC  has 
non-zero  diagonal  elements,  aind  non-overlapping  stripes, 
then  no  data  conflict  occurs,  and  the  execution  terminates 
In  n  global  cycles,  where  a  global  cycle  consists  of  a  com¬ 
munication  phase  followed  by  a  processing  phase.  The  time 
for  a  processing  phase  Is  roughly  that  of  one  floating  point 
operation,  while  the  time  for  a  communication  phase  depends 
on  the  slopes  of  the  stripes  of  the  matrix. 

In  this  paper,  we  consider  a  major  source  of  large 
sparse  matrices;  namely,  finite  elements  and  finite  differ¬ 
ence  discretizations  of  partial  differential  equations 
(PDE).  More  specifically,  we  study  the  stripe  structure  of 
stiffness  matrices  that  result  from  discretizations  on 
Irregular  domains  using  regular  grids.  First,  we  specify  In 
Section  2  the  types  of  domains  and  grids  used  In  the  study. 
Then,  In  Section  3,  we  show  that  for  matrices  resulting  from 
these  types  of  grids,  a  stripe  structure  with  very  few 
stripes  may  be  Introduced,  but  the  resulting  stripes  do.  In 
general,  overlap.  In  order  to  obtain  non -over  lapping 
stripes,  we  suggest  ,  In  Section  4,  a  multicolor  numbering 
scheme  that  spreads  the  stripes  within  a  matrix,"*  and  thus 
disengages  any  overlap  between  stripes.  The  multicolor 
numbering  Is  shown.  In  Section  5,  to  decrease  the  maximum 
separation  between  stripes,  which  minimizes  the  number  of 
data  Items  that  should  be  buffered,  at  any  give  time,  on  the 


Let  Q  be  a  rectangular  domain  that  is  covered  by  a  grid 
Mg  with  lines  parallel  to  the  sides  of  Q.  If  we  remove  from 
Q  any  number  of  rectangular  subdomains  whose  boundaries 
coincide  with  some  lines  in  Mg,  then  we  obtain  a  new  domain 
n  c  Q,  which  we  will  call  a  pierced  recterngular  domain.  The 
part  of  Mg  that  covers  n  is  denoted  by  M^  euid  is  called  a 
pierced  rectangular  grid  (see  Pig  1(b)). 


(c)  an  irregular  domain  (d)  an  irregular  grid  M^^ 

covered  by  isomorphic  to  M^^ 

Pig  1  -  Examples  of  finite  elements  grids 


If  D  is  an  irregular  domain,  we  may  approximate  D  by  a 
pierced  rectangular  domain  and  cover  it  by  a  pierced  rec¬ 
tangular  grid  (see  Pig  1(c)).  Another  alternative  (usually 


used  In  automatic  grid  generation),  is  to  map  0,  iso- 
parametr ically  [9],  into  a  pierced  rectauigular  domain  n. 


cover  n  by  a  pierced  rectangular  grid  auid  then  map 
back  to  a  grid  that  covers  O  (see  Fig  1(d)).  In  this 
case,  the  zero  pattern  o£  the  stiffness  matrix  that  results 
from  the  discretization  of  a  PDB  on  is  the  same  as  that 
resulting  from  the  discretization  of  the  PDE  on  For 

this  reason,  we  consider  here  only  discretizations  on 
pierced  rectangular  grids. 

Given  a  pierced  rectangular  domain  H  covered  by  a  gr. 
that  contains  n^  nodes,  let  be  a  rectcmgular  grid  tht 
includes  and  contains  n^  nodes,  Each  node  in  H 

may  be  Identified  by  a  unique  number  X,  l«fX«ng,  assigned  to 
it  by  some  numbering  scheme  (greek  letters  will  be  used  to 
identify  nodes  in  Mg) .  On  the  other  hauid,  the  nodes  in 
may  be  renumbered  such  that  each  node  XeM^  is  assigned  a 
unique  number  i-v(X),  l^i^n^. 

Def  in  it  ion  i.:  A  renumbering  of  is  said  to  be  deduced  from 
the  numbering  of  Mg  if  the  number  i-v(X),  assigned  to  ouiy 
node  XeM^  is  derived  as  follows: 

1-0 

For  X  -  1, . . . ,ng  Do 

If  X€Mj^  Then  {  1-1+1  ;  i/(X)  -  1  } 

Else  {  v(X)  is  undefined  } 

Clearly,  the  renumbering  function  v  satisfies  the  fol¬ 
lowing  relations: 


\  ?  fl  <**>  v(X)  >  V{Jl) 

\  fl  *a>  \-fi  i  v(X)-V(/i) 


(3. a) 
(3.b) 


The  inverse  v  ^  of  the  function  v  will  be  used  to  map 
the  number  i  of  any  node  in  into  its  identity  X-»v~^(l)  in 
Mq.  It  is  also  useful  to  define  a  function  which  deter¬ 
mines,  for  each  node  t.he  smallest  node  larger  thoui  0 
that  is  in  For  uniformity,  we  define  such  a  function  for 
any  as  follows: 


Definition  2: 


Next(O) 


The  function  Next(&)  :  Mg-*MQ 
,  min{M  :  ouid 


is  defined  by: 

if  such  n  exists 
otherwise 


Note  that  the  minimum  does  not  exist  if  every  node  m-0  is 
not  in  which  may  happen  only  if  0>v~^(np).[] 


Without  entering  into  the  details  of  the  generation  of 
stiffness  matrices,  we  just  mention  that  the  matrix  A  gen¬ 
erated  from  the  discretization  of  a  PDE  on  is  an 
matrix  in  which  each  row  1  corresponds  to  a  node  X>v~^(i)  in 
The  only  non  zero  elements  in  row  i  of  A  are  those  at 
positions  (i,m),  where  n^v  ^(m)  is  a  node  that  is  a  neighbor 
to  node  X  in  The  definition  of  neighboring  nodes 

depends  on  the  specific  discretization  used.  For  exounple, 
in  finite  element  discretizations,  two  nodes  are  neighbors 
if  there  exists  aui  element  that  contains  the  two  nodes. 


Prom  the  aUsove  discussion.  It  Is  clear  that  the  scheme 


used  to  number  the  nodes  determines  the  zero  structure  of 


the  matrix  A.  In  the  following  subsections  we  consider  two 
different  numbering  schemes.  For  ease  of  reference,  we 
refer  to  the  5-pointa  star  finite  difference  discretization 
by  FDg,  and  to  finite  elements  discretizations  with  3-nodes 
triangles,  6-nodes  triangles,  4-nodes  rectangles,  auid  9- 
nodes  recteuigles  by  FE„ ,  FE-,  FK. , 


and  FEq,  respectively. 


1.  REGULAR  tlQDE  NOMBERIMG 


A  regular  node  numbering  is  one  in  which  the  nodes  are 
numbered  sequentially,  column-wise  or  row  wise.  We  will  con¬ 
sider  only  column-wise  numbering  and  note  that  our  results 
apply  to  row-wise  numberings,  as  well. 

Let  Mg  contains  H  horizontal  lines  and  W  vertical 
lines,  and  identify  each  node  in  Mg  by  the  number  assigned 
to  it  by  the  column -wise  numbering  of  Mg,  that  is,  identify 
the  node  located  at  the  intersection  of  the  i^^  horizontal 
line  and  the  vertical  line  of  Mg  by  the  integer  (j- 
l)H+i.  It  is  easy  to  see  that  the  column -wise  numbering  of 
M^  is  the  one  deduced  from  the  above  numbering  of  Mg.  Let  v 
be  the  renumbering  function  introduced  in  Definition  1. 


Depending  on  the  specific  discretization,  we  may  intro¬ 
duce  few  functions  that  define  the  neighbors  of  each  node  X. 
in  Mg.  For  example,  for  FE^  discretization,  the  following 
nine  neighboring  functions  may  be  defined  for  each  XeMg  (see 

Pig  2) : 


X-H+1  X+1  X+H+1 


X-H-1  X-1  X+H-1 


Fig  2  -  the  neighbors  of  node  X 
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S' 


♦r«i 


•a 


P.4(M 

P_3(X) 

pJ(X) 

Pnrx) 


H  -  1 
H 

H  +  1 
1 


P4(X) 

pJ(X) 

P2(X) 

P^CX) 


X  +  H  +  1 
X  +  H 
X  +  H  -  1 
X  +  1 


Similar  neighboring  functions  may  be  defined  for  other 


discretizations,  and  then  used  to  determine  the  stripe 


structure  of  the  corresponding  matrix  as  illustrated  by  the 


following  theorem: 


Theorem  J.:  Let  the  numbering  of  M  be  deduced  from  that  of 


Mg,  and  let  A  be  the  stiffness  matrix  that  results  from  a 


specific  discretization  of  a  POE  on  (with*  a  specific 


definition  of  neighboring  nodes).  If  there  exists  rr  func¬ 


tions  p^:Mg-*MQ,  Ic»l,...,iT  such  that  for  any  two  neighboring 


nodes  X  and  p  in  Mg,  p^(X)»p,  for  some  k,  l«fka7r,  and  the 


functions  satisfy 


Pv(X)  <  Pv(X+l) 
PQ(X)  <  p;^,,(X) 


k»l, . . . ,F 
k»l,  .  .  . , rr-1 


(5. a) 
(5.b) 


Then,  it  is  possible  to  construct  a  stripe  structure  for  A 


with  stripe  count  n. 


Proof:  Define,  for  k-l,...,7r,  the  following  sets: 


Sj^  -  and  i} 


where 


-  It 


f  v(Pj^(v‘^(i))) 


if  Pj^(V  ^(D)  € 


otherwise 


Where  i  and  t  are  used  for  "defined",  autd  "not  defined". 


•iffi 


respectively.  It  is  readily  seen  that  if  the  (ifin)  ele¬ 
ment  of  A  is  non  zero,  then  nodes  and  v”^(m)  are 

neighbors,  and  there  exists  a  k  such  that  v~^(m) v”^( i) ) . 
Thus,  (i,m)6Sj^.  In  other  words,  every  position  of  A  that 
has  a  non-zero  element  is  in  some  set  S^,  . 

In  order  to  prove  that  each  set  S^,  l-^k^v,  is  a  stripe, 
we  consider  any  two  elements  (l,aj^(i))  and  (m,a^(m))  in  S^. 
If  l=v(X)  auid  m“v(At),  then  by  the  definition  of  Oy^,  both 
Pj^(X)  auid  hence  both  v(Pj^(X))  and 

v(p^(p))  are  defined.  Now  if  1  >  m,  then  from  (3. a)  X  >  fi 
and  from  (5. a)  Pj^(X)  >  Pj^(P)  •  Thus  v(Pj^(X))  >  v(p^(M)),  that 
is  ^  »  which  proves  that  is  a  strictly 

increasing  function  and  that  is  a  stripe. 

Finally,  we  need  to  show  that  For  this,  we 

consider  the  two  elements  (l,aj^(i))  e  Sj^,  emd  (m,aj^^^(m))  € 
Following  the  same  steps  as  above,  we  may  show  that 


if  i  ^  m,  then  X  ^  p  and  ^ 


But  from 


(5.b),  >  Pj^(*t)f  which  leads  to  •  [1 


Note  that  the  above  theorem  does  not  depend  on  the 
specific  numbering  of  Mg.  For  column  wise  numbering,  the 
functions  (4)  may  be  used  (assuming  H  >  2)  to  prove  the  fol¬ 
lowing: 

Corollary  1:  If  the  nodes  in  a  pierced  rectaungular  grid 
are  numbered  column-wise,  then  the  matrix  that  results  from 
FE^  on  is  a  striped  matrix  with  stripe  count  9  []. 


Given  that  the  matrix  resulting  from  FE^  on  Mg  has  nine 
parallel  stripes  [6],  Corollary  1  proves  that  piercing  Mg 
end  renumbering  the  nodes  do  not  change  the  stripe  count  of 
the  matrix  (however,  the  stripes  are  no  longer  parallel). 

Results  similar  to  Corrolary  1  may  be  proved  for  other 
discretizations  (see  table  1  for  a  summary).  Although 
these  results  indicate  that  the  network  MAT/VEC  may  be  used 
with  the  corresponding  stiffness  matrix,  they  do  not  guaran¬ 
tee  that  the  stripes  of  the  matrix  are  non  overlapping,  and 
thus,  that  the  operation  of  MAT/VEC  is  not  delayed  due  to 
internal  data  conflict.  For  example,  the  matrix  shown  in 
Fig  3(b),  which  has  overlapping  stripes,  is  obtained  from 
the  column  wise  numbering  of  the  pierced  rectangular  domain 
shown  in  Pig  3(a). 


FDs 

PE3 

PE^ 

PEg 

regular 

3 -col  or 
5-color 

5 

7(N0) 

X 

7 

9  (NO) 

X 

9 

ll(NO) 

X 

19 

23 

23 (NO) 

25 

29 

29 (NO) 

*)  NO  -  Not  overlapping  if  Lemma  2  applies 


Table  1  -  Stripe  count  for  different  numbering  schemes 


(a)  The  grid 


(b)  The  matrix 


Pig  3  -  column-wise  numbering 
A.  MiiLXL -COLOR  NODE  MUMBERIMG 

Many  multicolor  numbering  schemes  have  been  used  by 
different  authors  to  obtain  stiffness  matrices  that  have 
some  desirable  properties  (see  e.g.  [1^7 ^8]).  In  this  sec¬ 
tion,  we  Introduce  a  multicolor  scheme  that  spreads  apart 
the  stripes  of  A  such  that  they  do  not  overlap,  we  consider 
only  3-color  numbering,  and  we  assume  that  H»3h-1,  for  some 
integer  h.  This  may  be  satisfied,  always,  by  increasing  the 
height  of  Mg  appropriately. 

In  order  to  explain  the  3-color  numbering  scheme,  we 
assume  that  each  horizontal  line  in  is  given  a  color. 
Namely,  lines  1,4, . .  .jj3h-2  are  white,  lines  2,5,...,3h-l  are 
black,  and  lines  3,6,...,3h-3  are  red.  Numbers  are,  then, 
assigned  to  the  nodes  in  Mg  as  follows: 

For  each  column  j-l,...,w  Do 

1)  number  the  nodes  in  the  white  lines  of  column  j. 
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1  ^ 


I 


Pcd) 

Pi(X) 

pJ(X) 

pU\) 

pf(X) 

pJ(X) 


X 

X 

X 

X 

X 

X 


5h 

4h 

3h 

2h 

h 


2 

1 

1 

1 


pjcx) 

P.3(X) 

pjcx) 


X 

X 

X 

X 

X 


5h  + 
4h  + 
3h  + 
2h  + 
h 


2 

1 

1 

1 


(7) 


k-2h+l 


X+h 


X+4h-l 


X-3h*l 


X-2h+l 


X+h 


X+3h-l 


X-3h+l 


X-h 


X+2h-l 


x+5h-2  x-4h«l 


X+4h-l  X-Sh+2 


X-2h-fl 


X+3h-l  X-3h+l 


X+h 


X+3h-l 


X-h 


X+2h-l  X-4h+l 


X-h 


X+2h-l 


(a)  X  is  v^ite 


(b)  X  Is  black 


(c)  X  is  red 


Fig  5  -  The  neighbors  of  X. 


If  hi2  (H^5),  then  the  functions  (7)  satisfy  the  condi¬ 
tions  of  Theorem  1,  and  hence ,  the  resulting  matrix  may  be 
covered  by  eleven  stripes,  which  is  more  than  the  number  of 
stripes  resulting  from  the  regular  numbering.  However,  the 
stripes  in  this  case  are  non  overlapping  provided  that 
does  not  have  very  narrow  regions.  This  condition  on  is 
better  phrased  in  the  following  Lemmas: 


r.aimiia  j.:  Assuming  3 -color  node  numbering  and  4 -nodes  quadri¬ 
lateral  elements,  if  each  column  in  contains  at  least 
four  elements  that  are  either  contiguous  or  divided  into  two 
groups  of  two  contiguous  elements  each,  then 


Next(O)  -  0  e  h  -  2 


for  any  0  e  M. 


(8) 


where  the  function  Next  is  as  given  In  Definition  2. 


Proof:  For  ease  of  reference,  we  indicate  the  position  of  a 


node  that  lies  on  the  Intersection  of  the  horizontal 
line,  l<z<3h-l,  and  the  vertical  line,  l<v<W,  of  Mg  by 
the  pair  (z,v).  If  then  Mext(0)-0  and  (8)  is  trivial. 
Hence,  let  0  ^  be  at  position  (z,v)  and  have  the  color  R. 
Let  also  R1  be  the  color  that  follows  R,  that  is  Rl«  black, 
red  or  white,  if  R-  white,  black  or  red,  respectively.  From 
the  hypothesis,  the  v^^  column  of  elements  in  contains 
either  four  contiguous  elements  or  two  pairs  of  contiguous 
elements.  That  is  there  exists  two  horizontal  lines  a  and  b 
with  bka*f2  such  that  all  the  nodes  at  positions  (c,v)  auid 


(c,v-»-l),  for  a4c^a+2  auid  beceb*f2,  are  in  (see  Fig  6). 
Clearly,  we  may  have  one  of  three  cases: 


Case  (1) 


Case  (2) 

Fig  6  -  Column  v  of 


Case  (3) 


Case  1;  z  <  a:  In  this  case,  a  node  MeM^  with  color  R  should 


exist  at  a  position  (c,v),  aecea-t'2,  and  fi-0-1  •  the  number 
of  lines  with  color  R  between  lines  z  and  a.  In  other 
words,  ia  less  than  the  number  of  lines  with  color  R 
below  line  a.  Since  this  number  is  »  the  largest 
integer  less  thaui  and  given  that  b-t-2«3h-l  and  aab-2,  we 


obtain  •  h-2.  By  definition,  Next(0)«ix,  and 


hence  Next(0)-0<h-2. 


Case  2;  a  <  z  <  b:  In  this  case  a  node  with  color  R 

should  exist  at  a  position  (c,v),  b«c^b+2,  and  fi-G-1  is  less 
than  the  number  of  lines  with  color  R  between  lines  a+2  and 
b,  that  is  less  thorn  +1*  Since  b+2<3h-l  and  a^l, 

then  fi-O-K  l^^^J+1  -  h-2.  That  is,  Next(0)-0  ^  fi-Q  «h-2. 

Case  3;  z  >  b+2:  If  v»W  and  R>red,  then,  there  is  no  nodes 
in  larger  than  0.  Hence  Next(0)»nQ,  cuid  Next(0)-0  -  the 

number  of  lines  of  color  R  above  line  z  = 
z^b+3^a+5s»6,  which  gives  Next(0)-0^h-3 . 

On  the  other  hauid,  if  v<W,  or  R/red,  then  a  node 
with  color  R1  should  exist  at  a  position  (c,v+l)  if  R-«red  or 
at  a  position  (c,v)  if  R^white  or  black,  where  a«c^a+2. 
Here,  fi-6-1  »  [the  number  of  lines  of  color  R  above  line  z] 
-t-  [the  number  of  lines  of  color  R1  below  line  a] .  But  at 
most  lines  with  color  R  may  be  above  line  z,  and  at 

most  lines  with  color  R1  may  be  below  line  a.  Hence 

fL-O-l  *  ^  ^  ^^3h.-ZrZ±a|  Given  that  z»a+5, 

then  tL-0-1  <  h-3,  which  gives  Next(0)-0<h-2.  [  ] 

r-eimiia  2:  Assuming  3-color  node  numbering  auid  4-nodos  quadri¬ 
lateral  elements.  If  each  column  in  contains  at  least 
seven  contiguous  elements  or  two  separated  groups  of  at 
least  three  contiguous  elements  each,  then 

Next(O)  -  0  *  h-3  for  euiy  0  eM^  (9) 

Sketch  of  the  proof:  Let  be  at  position  (z,v).  Then 

from  the  hypothesis,  there  exists  two  horizontal  lines  a  aind 


b,  b  >  a-i-3  such  that  all  the  nodes  at  positions  (c,v)  emd 
(c,v+l),  for  a«c*«a+3  auid  b<c^b+3  are  in  The  rest  of  the 
proof  proceeds  in  a  way  similar  to  the  proof  of  Lemma  1.  [] 


Lemma  If  for  ouiy  0  €  Next  (0)  -  0  «  p,  then, 

v"^(i+l)  -  v"^(i)  ^  p  +  1  for  i  -  l,...,nj^-l  (10) 

Proof:  let  i«v(X)  and  l+l>v(X).  Given  that  X  is  numbered 
right  after  X,  then  any  node  0  with  X<0<X  is  not  in  and 
hence,  Next(0)-X.  But,  from  the  hypothesis,  X-0<p,  and 
hence,  if  X>X+1,  then  X-X<X-0ap.  That  is  X-X^p+1.  (] 


The  condition  (10)  may  be  trauislated  to  an  upper  limit 

on  the  number  of  columns  that  a  stripe  in  a  stiffness  matrix 

may  jump  in  one  row.  More  specifically,  if  a^^  ^  auid 

)c 

®^i+l  a  (£+1)  same  stripe,  then  condition  (10) 

limits  the  value  of  a^(  i+l)-aj^(l) .  This  may  bo  used  to 
prove  that  the  stripes  will  not  overlap  if  they  are  ade¬ 
quately  separated  from  each  other. 

Theorem  2:  Let  the  nodes  in  a  pierced  rectangular  grid  be 

numbered  using  the  3 -color  numbering  scheme,  auid  let  A  be 

the  matrix  that  results  from  an  PE.  discretization  on  M.. 

4  n 

If  satisfies  the  conditions  of  Lemma  1,  then  A  has  eleven 
non  overlapping  stripes.  Moreover,  if  M^  satisfies  the  con¬ 
ditions  of  Lemma  2,  then  the  eleven  stripes  are  strictly  non 
over lapp ing . 


Proof  tConsider  the  eleven  functions  given  by  equs  (7).  It 
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i 


is  straight  forward  to  check  that 

Pi-(X+1)  -  -  1  k— 5,...,5  (11. a) 

pj^^.l(l)  -  pJ;(X)  h-1  k— 5,...,4  (11. b) 

Clearly,  these  functions  satisfy  the  conditions  in  Theorem 

1,  and  hence,  A  has  eleven  stripes  k»-5,...5  of  the  form 

Sk  -  {(A,aj^(i))  ;  l-l’‘nj^  and  aj^(i)  i}  (12) 

where 

v(pj^(v‘^(i)))  if  Pj^(v"^(i))  6 

"it  otherwise 

In  order  to  prove  that  these  stripes  do  not  overlap,  we 
consider  any  integers  i,  m  and  k  such  that  (i,a^(l))6S^  and 
(i-m,Oj^^^(l-m))€Sj^^^,  and  we  let  i-m  *  v(X)  atnd  i  -  v(X). 
From  (12),  we  get 

W*-"*)  -  V")  -  -  ’'(Pk(">>  (^3) 

But  from  (ll.b)  P]^^^(^)  ”  ^k^^^  ^  m(h-l) ,  auid  from  (ll.a), 
Pk(X)  -  Pj5(^)  -  (X-X).  That  is 

PkT-m^^^  -  P,^(X)  a  m(h-l)  -  (X-X) 


Now,  if  the  conditions  of  Leimna  1  are  satisfied,  then  from 
(8)  auid  (10),  X-Xam(h-l)  and  thus  P]^+nj(X)-Pj^(X)^0.  By  pro¬ 
perty  (3)  of  the  renumbering  function  and  equation  (13),  we 
finally  obtain  ^  which  is  the  condition  (2) 
for  non  overlapping  stripes.  On  the  other  hauid,  if  the  con¬ 
ditions  of  Lemma  2  are  satisfied,  then  from  (9)  and  (10), 
X-X<m(h-1) .  This  leads  to  ^  <7^(4),  which  is,  the 
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condition  for  strictly  non  overlapping  stripes.  [] 


The  result  of  Theorem  2  proves  that  if  an  nxn  stiffness 
matrix  generated  by  FE^  auid  3-color  numbering  is  used  as 
input  to  MAT/VEC,  then,  execution  terminates  in  n  global 
cycles.  In  fact,  assuming  that  each  y-stream  communication 
lino  in  MAT/VEC  may  buffer  only  one  data  item,  the  progres¬ 
sion  of  the  execution  may  be  described  by  the  following  com¬ 
putation  fronts: 


^^t"^^t-k,o^(t-k) 


-5<k<5  and  Oj^(t-k)i}  t»l,..,n 


(14) 


The  3-color  numbering  scheme  introduced  here  causes 
also  the  stripes  of  the  matrices  obtained  from  FD^  and  FE^ 
discretizations  to  be  non  overlapping  (as  indicated  in  Tatble 
1) .  However  for  FEg  and  FE^  discretizations,  this  numbering 
does  not  spread  the  stripes  enough  and  overlap  may  still 
occur.  A  5-color  numbering  scheme  is  needed  In  this  case  to 
guarantee  non  overlapping  stripes.  The  emalysis  of  the  5- 
color  scheme  is  similar  to  that  discussed  in  this  section. 


Although  the  property  of  non-overlapping  stripes  is 
important  for  the  efficient  operation  of  MAT/VEC,  the 
multi-color  numbering  scheme  has  an  additional  advantage 
over  the  regular  scheme.  Namely,  It  produces  matrices  in 
which  the  stripes  are  uniformly  spread,  thus  minimizing  the 
maximum  separation  between  stripes.  This  is  explained  in 
details  in  the  next  section. 


i 

.  K 

3 


MAXIMUM 


BETWEEN  STRIPES 


It  was  shown  in  [6]  that  the  multiplication  of  a 

striped  matrix  by  a  vector  may  be  performed  on  the  network 

MAT/VEC  efficiently,  only  if  each  communication  link 

directed  from  a  coll  k  to  the  previous  cell  k-1  may  buffer 

at  least  d  .  data  items,  where  d  .  is  a  measure  of  the 
mm  min 

maximum  separation  between  the  stripes  of  the  matrix.  More 

specifically,  if  the  stripes  of  the  matrix  are  full,  then 

d  .  may  be  estimated  from 
min 


On  the  other  hand,  if  the  stripes  of  the  matrix  are  not 
full,  then 


where  xP^,  t»l,...,n,  are  the  x-^stream  data 
corresponding  to  the  execution  of  MAT/VEC. 


(15) 


profiles 


In  order  to  observe  the  effect  of  the  node  numbering 
scheme  on  the  separation  between  stripes,  we  consider  a  rec- 
teuigular  grid  Mg,  with  H-3h-l  horizontal  lines  and  W  verti¬ 
cal  lines,  and  we  assume  that  A  is  the  stiffness  matrix  that 
results  from  FE^  discretization  on  Mg.  If  regular  node 
numbering  is  applied,  then  the  functions  (4)  may  be  used  to 
construct  nine  parallel  full  stripes  S^,  k»-4,...,4,  that 
satisfy  for  any  t 
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-  22  - 

This  gives  d^^^<H-2»3 (h-1) -  On  the  other  hamd,  if  3-color 
numbering  is  applied,  then  the  functions  (7)  may  be  used  to 
produce  eleven  parallel  full  stripes  S^,  k>-5,...,5,  that 
satisfy  for  any  t 


J  h-1  if  k— 5, -2, 1,4 

‘^k+1^^^  “  ”lh  if  k— 4, -3, -1,0, 2, 3 


That  is  d^.  <h.  Hence,  although  the  multi-color  numbering 
min  ^ 

produces  a  matrix  with  a  larger  band  width  (5h-l  instead  of 

3h-f  1) ,  the  stripes  are  spread  within  the  bauid  almost  uni-  y! 

formly,  thus  decreasing  the  maximum  separation  between  the 

stripes  from  3 (h-1)  to  h.  S 


The  natural  question  to  ask  is:  does  d  .  remains 

min 

unchanged  if  Mg  is  pierced  and  the  nodes  in  the  resulting 
pierced  domain  are  renumbered?.  More  specifically,  if  we 
consider  the  stripe  structure  discussed  in  Section  4  auid 
given  by  (12),  can  we  construct  some  profile  functions  that 
correspond  to  the  fronts  (14)  such  that  the  maximum  in  (15) 
is  h?.  A  positive  answer  to  this  question  may  be  provided 
by  considering  the  following  profile  functions 

xP^.(k)  -  v(Next(p^(v”^(t-k))))  t-l,...,n  (16) 

where  the  function  Next  is  defined  in  Definition  2. 
Clearly,  if  Oj^(t-k)  i,  then  from  (12),  Pj^( vr^(t-k) )  €  M^^, 
and  hence  Next(Pj^(v  ^(t-k)))  •  Pj^( v~^(t-k) ) ,  v^ich  by  (12) 
and  (16)  gives 

xP^(k)  -  Oj^(t-k)  if  aj^(t-k)  i  (17) 


>>; 


That  is,  the  knots  of  the  profile  (16)  coincide  with  the 
fronts  (14).  It  is  also  straight  forward  to  check  that 


XP^(k)  ‘ 

< 

xP^(k+l) 

"k+i 

xP^(k+l) 

otherwise 

XPt(k)  1 

< 

xPt^.i(k) 

if  aj^(t-k)  i 

xPt^l(k) 

otherwise 

which  are  necessary  conditions  for  profile  functions  [6]. 


In  order  to  estimate  the  maximum  in  (15),  we  substitute 
( 16 )  in  ( 15 )  and  get 


dmin  “  nvax{v(Next(Pj^_j_j^(X))  )  -  v (Next (Pj^( X ) ) ) }  (18) 

k,  t 


where  X»=v~^(t)  and 

X*v"^(t-1) 

« 

Now, 

let  (/> 

“  max{)x  1 

^=*Pk+i(>')  and 

That  is 

<P 

is  the 

first 

node  before 

Pk^l(X)  that  is  in 

(take 

if 

no  such 

H  does  exist) . 

Given  that  Next(Pj^^j^(X) )  is 

the 

first 

node 

in  after 

Pk+i(^)'  we  get  v(Next(Pj^^j^(X) ) ) 

- 

v(^)+l. 

Hence 

v(Next(p.  , (X)))-v(Next(p.  (X)))  -  v(0)-v(Next(p,  (X) ) )+l 
K+l  K  K 


But  Next(Pj^(X)  )ipj^(X)  .  This  gives 

^  -  Next(Pj^(X))  4  -  Pj^(X) 


(20) 


where  we  used  Pj^^j^(X)-pj^(X)«h,  which  may  be  verified  from 
(7).  Also,  from  (7),  X>X  implies  that  Pj^(X)<  P)^(^)f  which 
together  with  the  property  (3.b)  of  the  renumbering  function 


V  and  (20)  gives 


v(0)  -  v(Next(p^(X) ) )  <  h  (21) 

Prom  (21)  and  (19)  in  (18),  we  finally  get  That 

is,  for  matrices  which  are  generated  from  FE^  discretiza¬ 
tions  on  pierced  rectangular  grids  with  hight  H  and  3-color 

tJ 

node  numbering,  a  buffer  capacity  of  ^  is  sufficient  to 
ensure  that  MAT/VEC  terminates  execution  in  n  global  cycles. 


QP  MAT/VEC  APPLrED  TO 


MATRICES 


As  defined  in  [6],  the  global  cycle  of  MAT/VEC  consists 
of  a  communication  phase  and  a  processing  phase.  The  time 
for  the  processing  phase  Is  the  time  needed  to  complete  a 
floating  point  mult Iply/add,  which  Is  constant  for  a  given 
architecture  of  the  cells  of  MAT/VEC.  On  the  other  hand, 
the  time  for  the  communication  phase  depends  on  the  stripe 
structure  of  the  matrix.  More  specifically,  given  the  stripe 
structure  of  the  Input  matrix  and  a  corresponding  data  pro¬ 
file,  the  time  for  the  communication  phase  of  the  t^^  global 
cycle,  l«t«n,  is  the  time  for  data  transmission,  where 


-  max{xP^(k)-xP^_j^(k)) 
k 


(22) 


Assuming  that  the  time  needed  to  complete  a 
multlply/add  operation  is  r^,  and  the  time  needed  to 
transmit  a  single  data  Item  between  two  cells  is  r^,  then 
the  total  execution  time  of  MAT/VEC  Is 


m  c 


n 


(23) 


For  stiffness  matrices  resulting  from  3 -color  numbering 
and  PE^  discretizations,  It  Is  easy  to  show  that  the  profile 
functions  (16)  lead  to 


t*l, . . . ,n 


However,  the  actual  value  of  is  usually  much  smaller  thaui 
h,  as  shown  by  the  following  example. 


Ila  are  needed  and  the 
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obvious.  Also,  the  number  of  cells  n  in  MAT/VEC  is  indepen¬ 


dent  of  the  size  of  the  grid,  while  the  number  of  cells  used 


in  the  systolic  approach  [5],  namely  b,  depends  on  the  size 


of  the  grid  (usually,  b  -  0(\Jn)). 


In  order  to  observe  the  effect  of  the  numbering  scheme. 


we  also  consider  the  matrix  corresponding  to  the  column-wise 


numbering  of  the  grid  of  Fig  7(a).  This  matrix  has  a  hand- 


width  b^  «  35  and  may  be  covered  by  nine  stripes.  However, 


the  stripes  are  not  "strictly  non  overlapping",  amd  the  con¬ 


struction  of  the  computation  fronts  (see  [6])  shows  that  326 


global  cycles  are  needed  for  the  completion  of  the  execution 


of  MAT/VEC.  Hence,  the  size  of  MAT/VEC  is  smaller  for  regu¬ 


lar  numbering  than  in  the  3-color  numbering  (9  cells  instead 


of  11  cells),  but  execution  is  slower  (326  global  cycles 


instead  of  174  cycles). 


Clearly,  general  results,  of  the  type  proved  in  the 


previous  sections,  may  only  be  obtained  for  grids  that  are. 


in  some  sense,  regular.  However,  given  auny  sparse  matrix. 


and  in  particular  a  stiffness  matrix,  a  stripe  structure  may 


be  constructed  for  the  matrix  auid  the  number  of  computation 


fronts  needed  for  the  execution  of  MAT/VEC  may  be  estimated. 


EXAMPLE  2: 


Highly  irregular  grids  may  be  obtained  If  triaungular 


elements  are  used.  Consider,  for  example,  the  two  grids 


shown  in  Fig  8  that  are  extracted  from  [9]. The  Cuthlll-Mckee 
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Fig  8  -  Irzegular  grids 


numbering  scheme  [2]  is  used  for  both  grids  starting  from 


the  encircled  nodes.  The  stiffness  matrix  corresponding  to 


the  grid  of  Pig  8(a)  Is  of  order  145  and  hats  a  batnd-width 


25.  The  minimum  number  of  stripes  that  may  cover  the  matrix 


is  9  (overlapping)  and  the  number  of  computation  fronts  is 


found  to  be  283  fronts.  For  the  grid  of  Fig  8(b),  the  order 


of  the  matrix  is  289  atnd  the  bautdwidth  is  49.  The  number  of 


stripes  is  found  to  be  13  atnd  the  corresponding  number  of 


computation  fronts  is  533.  By  comparision  with  systolic  mul¬ 


tiplication,  in  which  all  trivial  operations  are  performed. 


it  is  clear  that  the  orgatnlzatlon  of  the  non-zero  elements 


into  a  stripe  structure,  which  is  independent  of  the  size  of 


the  problem,  reduces  the  hardware  needed  for  the  completion 


of  the  multiplication,  without  slowing  down  execution. 


Finally,  we  note  that  the  grids  in  Fig  8  are 


constructed  without  emy  consideration  for  the  regularity  of 
the  stripe  structure.  More  specifically,  the  same  domains 
may  be  easily  covered  by  grids  that  have  the  Seune  element- 
density  distribution  of  the  given  grids,  but  that  are  iso¬ 
morphic  to  some  pierced  rectangular  grids.  The  matrices 
generated  from  these  grids  should  obey  the  results  of  Sec¬ 


tion  3  euid  4. 


It  is  shown  that  the  number  of  stripes  v  in  the  stripe 
structure  of  a  stiffness  matrix  is  independent  of  the  size 
of  the  problem,  and  is  much  smaller  thaun  the  band-width  of 
the  matrix.  For  pierced  rectangular  domains,  the  stripe 
count  ir  may  be  estimated  analytically  and  the  stripe  struc¬ 
ture  of  the  matrix  may  be  constructed  from  the  finite  ele¬ 
ment  grid. 

The  multicolor  node  numbering  presented  in  this  paper 
has  two  favorable  effects  on  the  resulting  matrix:  First,  it 
produces  non-overlapping  stripes,  which  prevents  any  data 
conflict  during  the  execution  of  M^T/VEC,  and  second,  it 
distributes  the  stripes  uniformly,  which  reduces  the  maximum 
separation  between  stripes  and  thus  minimizes  the  number  of 
buffers  needed  in  MAT/VEC. 

In  brief,  the  construction  of  stripe  structures  for 
stiffness  matrices  allows  for  the  efficient  utilization  of 
VLSI  networks.  Moreover,  the  number  of  cells  in  such  net- 
vrorks  is  determined  by  the  stripe  count  tr,  which  is  indepen¬ 
dent  of  the  size  of  the  problem. 


REFERENCES 


[1]  L.  Adams,  "Iterative  Algorithms  for  Large  Sparse 
Linear  Systems  on  Parallel  Computers,"  Ph.O.  Thesis, 
Univ.  of  Virginia  (October  1982). 


[2]  E.  Cuthill  and  J.  Mckee,  "Reducing  the  Beuidwidth  of 
Sparse  Symmetric  Matrices,"  Proceedings  of  the  ACM 
National  Conf,  New-York  (1969),  pp.  157-172. 


[3]  S.  Eisenstat,  M.  Gursky,  M.  Schultz  and  A.  Sherman, 
"Yale  Sparse  Matrix  Package,"  Tech.  Report  112,114, 
Dept,  of  Computer  Science,  Yale  University  (1977). 


[4]  A.  George  and  J.  Liu,  "Computer  Solutions  of  Large 
Sparse  Positive  Definite  Systems,"  Prentice-Hall 
series  in  Computational  Mathematics  (1981). 


[5]  H.  T.  Rung  and  C.  E.  Leiserson,  "Systolic  Arrays  for 
VLSI,"  in  Introduction  to  VLSI  Systems  (1980).  Ed.  by 
C.  Mead  and  L.  Conway,  Addison -Wes ley,  Reauiing,  Mass. 


[6]  R.  Melhem,  "Parallel  Solution  of  Linear  Systems  with 
Striped  Sparse  Matrices;  Part  1:  VLSI  Networks  for 
Striped  Matrices,"  Tech.  Report.  ICMA-86-91  (Jau\uary 
1986) . 


[7]  D.  O'leary,  "Ordering  Schemes  for  Parallel  Processing 
of  Certain  Mesh  Problems,"  SIAM  J.  on  Sci.  and  Stat. 
Computing,  Vol  5-3  (Sept  1984),  pp.  620-632. 


[8]  Y.  Saad  and  M.  Schultz,  "Parallel  Implemetations  of 
Preconditioned  Conjugate  Gradient  Methods,"  Tech. 
Report  YALEU/DCS/RR425,  Dept,  of  Computer  Science,  Yale 
University  (Oct.  1985). 


[9]  O.  C.  Zienkiewicz,  "The  Finite  Element  Method," 
McGraw-Hill,  (1979). 


